Text-Based Age and Gender Prediction for Online Safety Monitoring

Authors

  • Janneke van de Loo
  • Guy De Pauw
  • Walter Daelemans
Abstract

This paper explores the capabilities of text-based age and gender prediction geared towards the application of detecting harmful content and conduct on social media. More specifically, we focus on the use case of detecting sexual predators who try to “groom” children online and possibly provide false age and gender information in their user profiles. We perform age and gender classification experiments on a dataset of nearly 380,000 Dutch chat posts from a social network. We evaluate and compare binary age classifiers trained to separate younger and older authors according to different age boundaries and find that macro-averaged F-scores increase when the age boundary is raised. Furthermore, we show that use-case applicable performance levels can be achieved for the classification of minors versus adults, thereby providing a useful component in a cybersecurity monitoring tool for social network moderators.
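As an illustration of the evaluation measure named in the abstract, the following minimal sketch computes a macro-averaged F-score for a binary younger/older split at a chosen age boundary. It is not the authors' code: the function names, the convention that an author whose age equals the boundary counts as "older", and the toy ages and predictions are all assumptions made for this example.

```python
from typing import List, Sequence


def binarize_ages(ages: Sequence[int], boundary: int) -> List[str]:
    """Map ages to two classes. Assumption: age >= boundary counts as 'older'."""
    return ["older" if age >= boundary else "younger" for age in ages]


def f1_for_class(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 for one class, computed from true/false positives and false negatives."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # avoids division by zero when the class is never hit
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_class(gold, pred, label) for label in labels) / len(labels)


# Toy data (hypothetical, not taken from the paper's dataset).
ages = [12, 15, 17, 19, 25, 40]
gold = binarize_ages(ages, boundary=18)
pred = ["younger", "younger", "older", "older", "older", "older"]
score = macro_f1(gold, pred)
print(f"macro-averaged F1 at boundary 18: {score:.3f}")
```

Because the macro average weights both classes equally regardless of their size, moving the age boundary changes both the class balance and the per-class scores, which is why it is a natural measure for comparing classifiers across boundaries.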

Similar resources

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...

Neuro-Fuzzy Based Algorithm for Online Dynamic Voltage Stability Status Prediction Using Wide-Area Phasor Measurements

In this paper, a novel neuro-fuzzy based method combined with a feature selection technique is proposed for online dynamic voltage stability status prediction of power system. This technique uses synchronized phasors measured by phasor measurement units (PMUs) in a wide-area measurement system. In order to minimize the number of neuro-fuzzy inputs, training time and complication of neuro-fuzzy ...

Online Monitoring for Industrial Processes Quality Control Using Time Varying Parameter Model

A novel data-driven soft sensor is designed for online product quality prediction and control performance modification in industrial units. A combined approach of time variable parameter (TVP) model, dynamic auto regressive exogenous variable (DARX) algorithm, nonlinear correlation analysis and criterion-based elimination method is introduced in this work. The soft sensor performance validation...

Online Voltage Stability Monitoring and Prediction by Using Support Vector Machine Considering Overcurrent Protection for Transmission Lines

In this paper, a novel method is proposed to monitor the power system voltage stability using Support Vector Machine (SVM) by implementing real-time data received from the Wide Area Measurement System (WAMS). In this study, the effects of the protection schemes on the voltage magnitude of the buses are considered while they have not been investigated in previous researches. Considering overcurr...

Author gender identification from text using Bayesian Random Forest

Nowadays high usage of users from virtual environments and their connection via social networks like Facebook, Instagram, and Twitter shows the necessity of finding out shared subjects in this environment more than before. There are several applications that benefit from reliable methods for inferring age and gender of users in social media. Such applications exist across a wide area of fields,...

Publication date: 2016